15. Exercise: Improvement, Class Imbalance
Model Improvement: Accounting for Class Imbalance
We have a model tuned for higher recall, which aims to reduce the number of false negatives. Earlier, we discussed how class imbalance may actually bias our model towards predicting that all transactions are valid, resulting in more false negatives and true negatives. It stands to reason that this model could be further improved if we account for this imbalance!
To account for class imbalance during training of a binary classifier, LinearLearner offers the hyperparameter positive_example_weight_mult, which is the weight assigned to positive (fraudulent data) examples when training a binary classifier. The weight of negative examples (valid data) is fixed at 1.
The hyperparameter documentation for positive_example_weight_mult reads:
"If you want the algorithm to choose a weight so that errors in classifying negative vs. positive examples have equal impact on training loss, specify balanced."
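To build intuition for what "balanced" implies: with negative examples fixed at weight 1, giving both classes equal total impact on the loss suggests a positive-example multiplier of roughly n_negative / n_positive. The sketch below works through that arithmetic; note the formula is our reading of the quoted documentation, not the SDK's published internal computation, and the class counts are the commonly cited ones for the credit card fraud dataset.

```python
# Sketch: if negatives have weight 1 and positives have weight m, the two
# classes contribute equally to the loss when n_positive * m == n_negative,
# i.e. m = n_negative / n_positive. (This derivation is our assumption about
# what "balanced" does, based on the documentation quote above.)

def balanced_weight_mult(n_positive, n_negative):
    """Multiplier that makes total positive weight equal total negative weight."""
    return n_negative / n_positive

# Commonly cited class counts for the credit card fraud dataset:
# 492 fraudulent vs. 284,315 valid transactions (~0.17% positive).
n_pos, n_neg = 492, 284_315
mult = balanced_weight_mult(n_pos, n_neg)
print(round(mult, 1))  # each fraud example counts for ~578 valid ones
```

The size of that multiplier shows why an unweighted model can afford to ignore fraud entirely: misclassifying every positive example barely moves the loss.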
In the main exercise notebook, the exercises from defining to deploying an improved model look as follows:
EXERCISE: Create a LinearLearner with a positive_example_weight_mult parameter
In addition to tuning a model for higher recall, you should add a parameter that helps account for class imbalance.
# instantiate a LinearLearner
# include params for tuning for higher recall
# *and* account for class imbalance in training data
linear_balanced = None
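One plausible way to fill this in is to reuse the constructor arguments from the earlier recall-tuned model and add positive_example_weight_mult='balanced'. The sketch below collects those arguments in a dict (the actual sagemaker.LinearLearner call is shown commented out so the snippet runs without AWS access); the argument names follow the SageMaker Python SDK's LinearLearner estimator, and the specific values (e.g. a target recall of 0.9) are assumptions, not the official solution.

```python
# Sketch of keyword arguments one might pass to sagemaker's LinearLearner
# estimator. Values here are illustrative assumptions.
linear_learner_kwargs = {
    "predictor_type": "binary_classifier",
    # tune the classification threshold for higher recall, as in the
    # earlier exercise
    "binary_classifier_model_selection_criteria": "precision_at_target_recall",
    "target_recall": 0.9,
    # NEW: weight positive (fraudulent) examples to offset class imbalance
    "positive_example_weight_mult": "balanced",
}

# With an active SageMaker session, the estimator could then be created as:
# linear_balanced = sagemaker.LinearLearner(role=role,
#                                           instance_count=1,
#                                           instance_type="ml.c4.xlarge",
#                                           output_path=output_path,
#                                           sagemaker_session=sagemaker_session,
#                                           **linear_learner_kwargs)
print(linear_learner_kwargs["positive_example_weight_mult"])
```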
EXERCISE: Train the balanced estimator
Fit the new, balanced estimator on the formatted training data.
%%time
# train the estimator on formatted training data
EXERCISE: Deploy and evaluate the balanced estimator
Deploy the balanced predictor and evaluate it. Do the results match with your expectations?
%%time
# deploy and create a predictor
balanced_predictor = None
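Once the deployed predictor has returned labels for the test set, evaluation itself is plain counting. A minimal sketch, independent of SageMaker, assuming you have lists of true and predicted labels with 1 marking fraudulent transactions:

```python
def evaluate(y_true, y_pred):
    """Count confusion-matrix cells and derive recall and precision."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "recall": recall, "precision": precision}

# Toy example: 4 frauds in 8 transactions; the model catches 3 of the
# frauds and raises 1 false alarm.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
metrics = evaluate(y_true, y_pred)
print(metrics["recall"], metrics["precision"])  # 0.75 0.75
```

Compared with the unbalanced model, you would typically expect the balanced model to raise recall (fewer missed frauds), possibly at some cost in precision (more false alarms).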
An important question here, when evaluating your model, is: Do the results match your expectations? Much like in a scientific experiment, it is good practice to start with a hypothesis that drives your idea for improving a model; if the trained model reacts differently than you expect (i.e. the model metrics are worse), it is worth revisiting your assumptions and approach.
Try to complete all these tasks, and if you get stuck, you can reference the solution video, next!
Shutting Down the Endpoint
Remember to delete your deployed model endpoint after you finish with evaluation.